MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm
نویسندگان
چکیده
BACKGROUND Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. Understanding the roles of these uncultivated populations has broad application in ecology, evolution, biotechnology and medicine. Accurate binning of assembled metagenomic sequences is an essential step in recovering the genomes and understanding microbial functions. RESULTS We have developed a binning algorithm, MaxBin, which automates the binning of assembled metagenomic scaffolds using an expectation-maximization algorithm after the assembly of metagenomic sequencing reads. Binning of simulated metagenomic datasets demonstrated that MaxBin had high levels of accuracy in binning microbial genomes. MaxBin was used to recover genomes from metagenomic data obtained through the Human Microbiome Project, which demonstrated its ability to recover genomes from real metagenomic datasets with variable sequencing coverages. Application of MaxBin to metagenomes obtained from microbial consortia adapted to grow on cellulose allowed genomic analysis of new, uncultivated, cellulolytic bacterial populations, including an abundant myxobacterial population distantly related to Sorangium cellulosum that possessed a much smaller genome (5 MB versus 13 to 14 MB) but has a more extensive set of genes for biomass deconstruction. For the cellulolytic consortia, the MaxBin results were compared to binning using emergent self-organizing maps (ESOMs) and differential coverage binning, demonstrating that it performed comparably to these methods but had distinct advantages in automation, resolution of related genomes and sensitivity. CONCLUSIONS The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes. The isolation of dozens of species in cellulolytic microbial consortia, including a novel species of myxobacteria that has the smallest genome among all sequenced aerobic myxobacteria, was easily achieved using the binning software. This work demonstrates that the processes required for recovering genomes from assembled metagenomic datasets can be readily automated, an important advance in understanding the metabolic potential of microbes in natural environments. MaxBin is available at https://sourceforge.net/projects/maxbin/.
منابع مشابه
Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes.
Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in the genome recovery domain is to agglomerate or 'bin' sequences assembled from metagenomic reads into individual groups. Metagenomic binning without consideration of reference sequences enables the comprehensive discovery of new...
متن کاملGroopM: an automated tool for the recovery of population genomes from related metagenomes
Metagenomic binning methods that leverage differential population abundances in microbial communities (differential coverage) are emerging as a complementary approach to conventional composition-based binning. Here we introduce GroopM, an automated binning tool that primarily uses differential coverage to obtain high fidelity population genomes from related metagenomes. We demonstrate the effec...
متن کاملQuantitative SPECT and planar 32P bremsstrahlung imaging for dosimetry purpose –An experimental phantom study
Background: In this study, Quantitative 32P bremsstrahlung planar and SPECT imaging and consequent dose assessment were carried out as a comprehensive phantom study to define an appropriate method for accurate Dosimetry in clinical practice. Materials and Methods: CT, planar and SPECT bremsstrahlung images of Jaszczak phantom containing a known activity of 32P were acquired. In addition, Phanto...
متن کاملCOCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge
Motivation The advent of next-generation sequencing technologies enables researchers to sequence complex microbial communities directly from the environment. Because assembly typically produces only genome fragments, also known as contigs, instead of an entire genome, it is crucial to group them into operational taxonomic units (OTUs) for further taxonomic profiling and down-streaming functiona...
متن کاملPii: S0031-3203(01)00133-9
We present a novel method for representing “extruded” distributions. An extruded distribution is an M -dimensional manifold in the parameter space of the component distribution. Representations of that manifold are “continuous mixture models”. We present a method for forming one-dimensional continuous Gaussian mixture models of sampled extruded Gaussian distributions via ridges of goodness-of-#...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 2 شماره
صفحات -
تاریخ انتشار 2014